We consider nonparametric estimation of the mean and covari-
ance functions for functional/longitudinal data. Strong uniform con- 
vergence rates are developed for estimators that are local-linear smooth- 
ers. Our results are obtained in a unified framework in which the 
number of observations within each curve/cluster can be of any rate 
relative to the sample size. We show that the convergence rates for 
the procedures depend on both the number of sample curves and the 
number of observations on each curve. For sparse functional data, 
these rates are equivalent to the optimal rates in nonparametric re- 
gression. For dense functional data, root-n rates of convergence can 
be achieved with proper choices of bandwidths. We further derive 
almost sure rates of convergence for principal component analysis 
using the estimated covariance function. The results are illustrated 
with simulation studies. 

1. Introduction. Estimating the mean and covariance functions is an
essential problem in longitudinal and functional data analysis. Many recent
papers have focused on nonparametric estimation so as to model the mean and
covariance structures flexibly. A partial list of such work includes Ramsay
and Silverman (2005), Lin and Carroll (2000), Wang (2003), Yao, Müller
and Wang (2005a, 2005b), Yao and Lee (2006) and Hall, Müller and Wang
(2006).

On the other hand, functional principal component analysis (FPCA) 
based on nonparametric covariance estimation has become one of the most 
common dimension reduction approaches in functional data analysis.
Applications include temporal trajectory interpolation [Yao, Müller and Wang
(2005a)], functional generalized linear models [Müller and Stadtmüller (2005)
and Yao, Müller and Wang (2005b)] and functional sliced inverse regression
[Ferré and Yao (2005), Li and Hsing (2010)], to name a few. A number
of algorithms have been proposed for FPCA, some of which are based on
spline smoothing [James, Hastie and Sugar (2000), Zhou, Huang and Carroll
(2008)] and others on kernel smoothing [Yao, Müller and Wang
(2005a), Hall, Müller and Wang (2006)]. As usual, large-sample theories can
provide a basis for understanding the properties of these estimators. So far, 
the asymptotic theories for estimators based on kernel smoothing or local- 
polynomial smoothing are better understood than those based on spline 
smoothing. 

Some definitive theoretical findings on FPCA emerged in recent years. In 
particular, Hall and Hosseini-Nasab (2006) proved various asymptotic
expansions for FPCA for densely recorded functional data, and Hall, Müller
and Wang (2006) established the optimal $L^2$ convergence rate for FPCA
in the sparse functional data setting. One of the most interesting findings
in Hall, Müller and Wang (2006) was that the estimated eigenfunctions,
although computed from an estimated two-dimensional surface, enjoy the
convergence rate of one-dimensional smoothers, and under favorable
conditions the estimated eigenvalues are root-n consistent. In contrast with the $L^2$
convergence rates of these nonparametric estimators, less is known in terms
of uniform convergence rates. Yao, Müller and Wang (2005a) studied the
uniform consistency of the estimated mean, covariance and eigenfunctions, 
and demonstrated that such uniform convergence properties are useful in 
many settings; some other examples can also be found in Li et al. (2008). 

In classical nonparametric regression where observations are independent, 
there are a number of well-known results concerning the uniform
convergence rates of kernel-based estimators. Those include Bickel and Rosenblatt
(1973), Härdle, Janssen and Serfling (1988) and Härdle (1989). More
recently, Claeskens and Van Keilegom (2003) extended some of those results
to local likelihood estimators and local estimating equations. However, as
remarked in Yao, Müller and Wang (2005a), whether those optimal rates
can be extended to functional data remains unknown. 

In a typical functional data setting, a sample of n curves is observed at
a set of discrete points; denote by $m_i$ the number of observations for curve i.
The existing literature focuses on two antithetical data types: the first one,
referred to as dense functional data, is the case where each $m_i$ is larger than
some power of n; the second type, referred to as sparse functional data, is the
situation where each $m_i$ is bounded by a finite positive number or follows a
fixed distribution. The methodologies used to treat the two situations have
been different in the literature. For dense functional data, the conventional
approach is to smooth each individual curve first before further analysis; see
Ramsay and Silverman (2005), Hall, Müller and Wang (2006) and Zhang
and Chen (2007). For sparse functional data, limited information is given by
the sparsely sampled observations from each individual curve and hence it is
essential to pool the data in order to conduct inference effectively; see Yao,
Müller and Wang (2005a) and Hall, Müller and Wang (2006). However, in
practice it is possible that some sample curves are densely observed while 
others are sparsely observed. Moreover, in dealing with real data, it may 
even be difficult to classify which scenario we are faced with and hence to 
decide which methodology to use. 

This paper is aimed at resolving the issues raised in the previous two 
paragraphs. The precise goals will be stated after we introduce the notation 
in Section 2. In a nutshell, we will consider uniform rates of convergence 
of the mean and the covariance functions, as well as rates in the ensuing 
FPCA, using local-linear smoothers [Fan and Gijbels (1995)]. The rates that 
we obtain will address all possible scenarios of the $m_i$'s, and we show that
the optimal rates for dense and sparse functional data can be derived as 
special cases. 

This paper is organized as follows. In Section 2, we introduce the model 
and data structure as well as all of the estimation procedures. We describe 
the asymptotic theory of the procedures in Section 3, where we also discuss 
the results and their connections to prominent results in the literature. Some 
simulation studies are provided in Section 4, and all proofs are included in 
Section 5. 

2. Model and methodology. Let $\{X(t), t \in [a,b]\}$ be a stochastic process
defined on a fixed interval $[a,b]$. Denote the mean and covariance function
of the process by

$\mu(t) = E\{X(t)\}$,   $R(s,t) = \mathrm{cov}\{X(s), X(t)\}$,

which are assumed to exist. Except for smoothness conditions on $\mu$ and $R$,
we do not impose any parametric structure on the distribution of X. This
is a commonly considered situation in functional data analysis.
Suppose we observe

$Y_{ij} = X_i(T_{ij}) + U_{ij}$,   $i = 1, \ldots, n$, $j = 1, \ldots, m_i$,

where the $X_i$'s are independent realizations of X, the $T_{ij}$'s are random
observational points with density function $f_T(\cdot)$, and the $U_{ij}$'s are identically
distributed random errors with mean zero and finite variance $\sigma^2$. Assume
that the $X_i$'s, $T_{ij}$'s and $U_{ij}$'s are all independent. Assume that $m_i \ge 2$ and
let $N_i = m_i(m_i - 1)$.

Our approach is based on the local-linear smoother; see, for example,
Fan and Gijbels (1995). Let $K(\cdot)$ be a symmetric probability density function
on $[-1, 1]$ and $K_h(t) = (1/h)K(t/h)$, where h is the bandwidth. A local-linear
estimator of the mean function is given by $\hat\mu(t) = \hat a_0$, where

To estimate $\sigma^2$, we first estimate $V(t) := C(t,t) + \sigma^2$ by $\hat V(t) = \hat a_0$, where

$(\hat a_0, \hat a_1) = \arg\min_{a_0, a_1} \frac{1}{n} \sum_{i=1}^{n} \frac{1}{m_i} \sum_{j=1}^{m_i} \{Y_{ij}^2 - a_0 - a_1(T_{ij} - t)\}^2 K_{h_V}(T_{ij} - t).$
As in (2.1),
(2.4) 
where 

^ = - E — E - t)m - t)/h v } r Y 2 . 
We then estimate $\sigma^2$ by

$\hat\sigma^2 = \frac{1}{b-a} \int_a^b \{\hat V(t) - \hat C(t,t)\}\,dt.$

For the problem of mean and covariance estimation, the literature has 
focused on dense and sparse functional data. The sparse case roughly refers 
to the situation where each $m_i$ is essentially bounded by some finite number
M. Yao, Müller and Wang (2005a) and Hall, Müller and Wang (2006)
considered this case and also used local-linear smoothers in their estimation
procedures. The difference between the estimators in (2.1), (2.3) and those
considered in Yao, Müller and Wang (2005a) and Hall, Müller and Wang
(2006) is essentially that we attach weights, $m_i^{-1}$ and $N_i^{-1}$, to each curve i
in the optimizations [although Yao, Müller and Wang (2005a) smoothed the
residuals in estimating R] . One of the purposes of those weights is to ensure 
that the effect that each curve has on the optimizers is not overly affected 
by the denseness of the observations. 
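
To make the role of the per-curve weights concrete, here is a minimal Python sketch of a weighted local-linear fit of this kind. It is an illustration under stated assumptions, not the authors' implementation: the Epanechnikov kernel, the function names `epanechnikov` and `local_linear`, and the accumulators `s` and `r` are all introduced here. Each observation on curve i enters with weight $1/(n m_i)$, so a densely observed curve does not dominate the criterion.

```python
def epanechnikov(u):
    """Symmetric density on [-1, 1]; an assumed choice of the kernel K."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def local_linear(t, curves, h, response=lambda y: y):
    """Weighted local-linear fit at the point t.

    curves is a list of (times, values) pairs, one pair per curve i.
    Every observation on curve i carries weight 1 / (n * m_i).
    """
    n = len(curves)
    s = [0.0, 0.0, 0.0]  # kernel-weighted moments of (T_ij - t) / h
    r = [0.0, 0.0]       # kernel-weighted moments times the response
    for times, values in curves:
        m_i = len(times)
        for t_ij, y_ij in zip(times, values):
            d = (t_ij - t) / h
            w = epanechnikov(d) / (h * n * m_i)
            z = response(y_ij)
            s[0] += w
            s[1] += w * d
            s[2] += w * d * d
            r[0] += w * z
            r[1] += w * d * z
    det = s[0] * s[2] - s[1] ** 2
    if det <= 0.0:
        raise ValueError("too few observations near t for bandwidth h")
    # Intercept of the weighted least-squares line, i.e. the fitted value at t.
    return (r[0] * s[2] - r[1] * s[1]) / det
```

Calling `local_linear(t, curves, h)` smooths the raw observations, while passing `response=lambda y: y * y` smooths the squared observations, which corresponds to the criterion used for $V(t)$.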

Dense functional data roughly refer to data for which each $m_i \ge M_n \to \infty$
for some sequence $M_n$, where specific assumptions on the rate of increase of
$M_n$ are required for this case to have a distinguishable asymptotic theory in
the estimation of the mean and covariance. Hall, Müller and Wang (2006)
and Zhang and Chen (2007) considered the so-called "smooth-first-then- 
estimate" approach, namely, the approach that first preprocesses the discrete 
functional data by smoothing, and then adopts the empirical estimators of 
the mean and covariance based on the smoothed functional data. See also 
Ramsay and Silverman (2005). 

As will be seen, our approach is suitable for both sparse and dense func- 
tional data. Thus, one particular advantage is that we do not have to dis- 
cern data type — dense, sparse or mixed — and decide which methodology 
should be used accordingly. In Section 3, we will provide the convergence
rates of $\hat\mu(t)$, $\hat R(s,t)$ and $\hat\sigma^2$, and also those of the estimated eigenvalues and
eigenfunctions of the covariance operator of X. The novelties of our results 
include: 

(a) Almost-sure uniform rates of convergence for $\hat\mu(t)$ and $\hat R(s,t)$ over the
entire range of s, t will be proved.

(b) The sample sizes $m_i$ per curve will be completely flexible. For the special
cases of dense and sparse functional data, these rates match the best
known/conjectured rates.

3. Asymptotic theory. To prove a general asymptotic theory, assume
that $m_i$ may depend on n as well, namely, $m_i = m_{in}$. However, for simplicity
we continue to use the notation $m_i$. Define

$\gamma_{nk} = \left( n^{-1} \sum_{i=1}^{n} m_i^{-k} \right)^{-1}$,   $k = 1, 2, \ldots$,

which is the kth order harmonic mean of $\{m_i\}$, and for any bandwidth h,

$\delta_{n1}(h) = [\{1 + (h\gamma_{n1})^{-1}\} \log n / n]^{1/2}$

and

$\delta_{n2}(h) = [\{1 + (h\gamma_{n1})^{-1} + (h^2\gamma_{n2})^{-1}\} \log n / n]^{1/2}$.
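
The quantities just defined are simple to evaluate for a given design, which can help when comparing sparse and dense regimes numerically. The following Python sketch computes them from a list of per-curve sample sizes; the function names `gamma_nk`, `delta_n1` and `delta_n2` are labels chosen here for illustration.

```python
import math


def gamma_nk(m, k):
    """k-th order harmonic mean of the per-curve sample sizes m_i."""
    n = len(m)
    return 1.0 / (sum(m_i ** (-k) for m_i in m) / n)


def delta_n1(h, m):
    """delta_n1(h) = [{1 + 1/(h * gamma_n1)} * log(n) / n]^(1/2)."""
    n = len(m)
    return math.sqrt((1.0 + 1.0 / (h * gamma_nk(m, 1))) * math.log(n) / n)


def delta_n2(h, m):
    """delta_n2(h) adds the term 1/(h^2 * gamma_n2) inside the braces."""
    n = len(m)
    g1 = gamma_nk(m, 1)
    g2 = gamma_nk(m, 2)
    inner = 1.0 + 1.0 / (h * g1) + 1.0 / (h * h * g2)
    return math.sqrt(inner * math.log(n) / n)
```

For a balanced design with $m_i = m$ for all i, `gamma_nk(m_list, k)` reduces to $m^k$, so increasing m shrinks the terms $(h\gamma_{n1})^{-1}$ and $(h^2\gamma_{n2})^{-1}$ and moves both quantities toward $(\log n/n)^{1/2}$.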

We first state the assumptions. In the following, $h_\mu$, $h_R$ and $h_V$ are
bandwidths, which are assumed to change with n.

(C1) For some constants $m_T > 0$ and $M_T < \infty$, $m_T \le f_T(t) \le M_T$ for all
$t \in [a,b]$. Further, $f_T$ is differentiable with a bounded derivative.

(C2) The kernel function $K(\cdot)$ is a symmetric probability density function
on $[-1,1]$, and is of bounded variation on $[-1,1]$. Denote
$\nu_2 = \int_{-1}^{1} t^2 K(t)\,dt$.

(C3) $\mu(\cdot)$ is twice differentiable and the second derivative is bounded on
$[a,b]$.

(C4) All second-order partial derivatives of $R(s,t)$ exist and are bounded
on $[a,b]^2$.

(C5) $E(|U_{ij}|^{\lambda_\mu}) < \infty$ and $E(\sup_{t\in[a,b]} |X(t)|^{\lambda_\mu}) < \infty$ for some $\lambda_\mu \in (2,\infty)$;
$h_\mu \to 0$ and $(h_\mu^2 + h_\mu/\gamma_{n1})^{-1} (\log n/n)^{1-2/\lambda_\mu} \to 0$ as $n \to \infty$.

(C6) $E(|U_{ij}|^{2\lambda_R}) < \infty$ and $E(\sup_{t\in[a,b]} |X(t)|^{2\lambda_R}) < \infty$ for some $\lambda_R \in (2,\infty)$;
$h_R \to 0$ and $(h_R^4 + h_R^3/\gamma_{n1} + h_R^2/\gamma_{n2})^{-1} (\log n/n)^{1-2/\lambda_R} \to 0$ as $n \to \infty$.

(C7) $E(|U_{ij}|^{2\lambda_V}) < \infty$ and $E(\sup_{t\in[a,b]} |X(t)|^{2\lambda_V}) < \infty$ for some $\lambda_V \in (2,\infty)$;
$h_V \to 0$ and $(h_V^2 + h_V/\gamma_{n1})^{-1} (\log n/n)^{1-2/\lambda_V} \to 0$ as $n \to \infty$.

The moment condition $E(\sup_{t\in[a,b]} |X(t)|^{\lambda}) < \infty$ in (C5)-(C7) holds rather
generally; in particular, it holds for Gaussian processes with continuous
sample paths [cf. Landau and Shepp (1970)] for all $\lambda > 0$. This condition was
also adopted by Hall, Müller and Wang (2006).



UNIFORM CONVERGENCE RATES FOR FUNCTIONAL DATA 






3.1. Convergence rates in mean estimation. The convergence rate of $\hat\mu(t)$
is given in the following result.

Theorem 3.1. Assume that (C1)-(C3) and (C5) hold. Then

(3.1)   $\sup_{t\in[a,b]} |\hat\mu(t) - \mu(t)| = O(h_\mu^2 + \delta_{n1}(h_\mu))$   a.s.

The following corollary addresses the special cases of sparse and dense
functional data. For convenience, we use the notation $a_n \lesssim b_n$ to mean
$a_n = O(b_n)$.

Corollary 3.2. Assume that (C1)-(C3) and (C5) hold.

(a) If $\max_{1\le i\le n} m_i \le M$ for some fixed M, then

(3.2)   $\sup_{t\in[a,b]} |\hat\mu(t) - \mu(t)| = O(h_\mu^2 + \{\log n/(n h_\mu)\}^{1/2})$   a.s.

(b) If $\min_{1\le i\le n} m_i \ge M_n$ for some sequence $M_n$ where
$M_n^{-1} \lesssim h_\mu \lesssim (\log n/n)^{1/4}$ (so that $M_n h_\mu$ is bounded away from 0), then

$\sup_{t\in[a,b]} |\hat\mu(t) - \mu(t)| = O(\{\log n/n\}^{1/2})$   a.s.
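
As a brief sketch of how the two cases of Corollary 3.2 can be read off from Theorem 3.1 (our reading, not the authors' omitted proof), one can bound $\gamma_{n1}$ in each regime and substitute into $\delta_{n1}(h_\mu)$:

```latex
% (a) Sparse case: 1 \le m_i \le M implies 1 \le \gamma_{n1} \le M, hence
(M h_\mu)^{-1} \le (h_\mu \gamma_{n1})^{-1} \le h_\mu^{-1},
\qquad\text{so, since } h_\mu \to 0,\qquad
\delta_{n1}(h_\mu) \asymp \{\log n/(n h_\mu)\}^{1/2}.

% (b) Dense case: m_i \ge M_n implies \gamma_{n1} \ge M_n, hence
(h_\mu \gamma_{n1})^{-1} \le (h_\mu M_n)^{-1} = O(1),
\qquad
\delta_{n1}(h_\mu) = O(\{\log n/n\}^{1/2}),
\qquad
h_\mu^2 \lesssim (\log n/n)^{1/2}.
```

In case (a) the term $(h_\mu\gamma_{n1})^{-1}$ dominates the constant 1, which yields the one-dimensional nonparametric rate; in case (b) it stays bounded, and the upper bound on $h_\mu$ keeps the bias term no larger than the stochastic term.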

The proof of Theorem 3.1, like the proofs of the other results, will be given in
Section 5. First, we give a few remarks on these results.

Discussion. 

1. On the right-hand side of (3.1), $O(h_\mu^2)$ is a bound for the bias while $\delta_{n1}(h_\mu)$
is a bound for $\sup_{t\in[a,b]} |\hat\mu(t) - E(\hat\mu(t))|$. The derivation of the bias is easy
to understand and is essentially the same as in classical nonparametric
regression. The derivation of the second bound is more involved and
represents our main contribution in this result. To obtain a uniform bound
for $|\hat\mu(t) - E(\hat\mu(t))|$ over $[a,b]$, we first obtained a uniform bound over
a finite grid on $[a,b]$, where the grid grows increasingly dense with n,
and then showed that the difference between the two uniform bounds is
asymptotically negligible. This approach was inspired by Härdle, Janssen and
Serfling (1988), which focused on nonparametric regression. One of the
main difficulties in our result is that we need to deal with within-curve
dependence, which is not an issue in classical nonparametric regression. Note
that the dependence between $X(t)$ and $X(t')$ typically becomes stronger
as $|t - t'|$ becomes smaller. Thus, for dense functional data, the within-curve
dependence constitutes an integral component of the overall rate
derivation.
2. The sparse functional data setting in (a) of Corollary 3.2 was considered
by Yao, Müller and Wang (2005a) and Hall, Müller and Wang
(2006). Actually, Yao, Müller and Wang (2005a) assumed that the $m_i$'s
are i.i.d. positive random variables with $E(m_i) < \infty$, which implies that
$0 < 1/E(m_i) \le E(1/m_i) \le 1$ by Jensen's inequality; this corresponds to
the case where $\gamma_{n1}^{-1}$ is bounded away from 0 and also leads to (3.2). The
rate in (3.2) is the classical nonparametric rate for estimating a univariate
function. We will refer to this as a one-dimensional rate. The one-dimensional
rate of $\hat\mu(t)$ was alluded to in Yao, Müller and Wang (2005a)
but was not specifically obtained there.

3. Hall, Müller and Wang (2006) and Zhang and Chen (2007) address the
dense functional data setting in (b) of Corollary 3.2, where both papers
take the approach of first fitting a smooth curve to $Y_{ij}$, $1 \le j \le m_i$, for
each i, and then estimating $\mu(t)$ and $R(s,t)$ by the sample mean and covariance
functions, respectively, of the fitted curves. Two drawbacks are:

(a) Differentiability of the sample curves is required. Thus, for instance, 
this approach will not be suitable for the Brownian motion, which 
has continuous but nondifferentiable sample paths. 

(b) The sample curves that are included in the analysis need to be all 
densely observed; those that do not meet the denseness criterion are 
dropped even though they may contain useful information. 

Our approach does not require sample-path differentiability and all of
the data are used in the analysis. It is interesting to note that (b) of
Corollary 3.2 shows that a root-n rate of convergence for $\hat\mu$ can be achieved
if the number of observations per sample curve is at least of the order
$(n/\log n)^{1/4}$, while a similar conclusion was also reached in Hall, Müller
and Wang (2006) for the smooth-first-then-estimate approach.

4. Our nonparametric estimators $\hat\mu$, $\hat R$ and $\hat V$ are based on local-linear smoothers,
but the methodology and theory can be easily generalized to higher-order
local-polynomial smoothers. By the equivalent kernel theory for
local-polynomial smoothing [Fan and Gijbels (1995)], higher-order local-polynomial
smoothing is asymptotically equivalent to higher-order kernel
smoothing. Therefore, applying higher-order polynomial smoothing will
result in improved rates for the bias under suitable smoothness assumptions.
The rate for the variance, on the other hand, will remain the same.
In our sparse setting, if pth order local polynomial smoothing is applied
under suitable conditions, for some positive integer p, the uniform convergence
rate of $\hat\mu(t)$ will become

$\sup_{t\in[a,b]} |\hat\mu(t) - \mu(t)| = O(h_\mu^{2([p/2]+1)} + \{\log n/(n h_\mu)\}^{1/2})$   a.s.,

where [a] denotes the integer part of a. See Claeskens and Van Keilegom 
(2003) and Masry (1996) for support of this claim in different but related 
contexts. 
3.2. Convergence rates in covariance estimation. The following results
give the convergence rates for $\hat R(s,t)$ and $\hat\sigma^2$.

Theorem 3.3. Assume that (C1)-(C6) hold. Then

(3.3)   $\sup_{s,t\in[a,b]} |\hat R(s,t) - R(s,t)| = O(h_\mu^2 + \delta_{n1}(h_\mu) + h_R^2 + \delta_{n2}(h_R))$   a.s.

Theorem 3.4. Assume that (C1), (C2), (C4), (C6) and (C7) hold. Then

(3.4)   $\hat\sigma^2 - \sigma^2 = O(h_R^2 + \delta_{n1}(h_R) + \delta_{n2}^2(h_R) + h_V^2 + \delta_{n1}^2(h_V))$   a.s.

We again highlight the cases of sparse and dense functional data.

Corollary 3.5. Assume that (C1)-(C7) hold.

(a) Suppose that $\max_{1\le i\le n} m_i \le M$ for some fixed M. If $h_R^2 \lesssim h_\mu \lesssim h_R$, then

(3.5)   $\sup_{s,t\in[a,b]} |\hat R(s,t) - R(s,t)| = O(h_R^2 + \{\log n/(n h_R^2)\}^{1/2})$   a.s.

If $h_V + (\log n/n)^{1/3} \lesssim h_R \lesssim h_V^2\, n/\log n$, then

$\hat\sigma^2 - \sigma^2 = O(h_R^2 + \{\log n/(n h_R)\}^{1/2})$   a.s.

(b) If $\min_{1\le i\le n} m_i \ge M_n$ for some sequence $M_n$ where
$M_n^{-1} \lesssim h_\mu, h_R, h_V \lesssim (\log n/n)^{1/4}$, then both
$\sup_{s,t\in[a,b]} |\hat R(s,t) - R(s,t)|$ and $\hat\sigma^2 - \sigma^2$ are
$O(\{\log n/n\}^{1/2})$ a.s.

Discussion. 

1. The rate in (3.5) is the classical nonparametric rate for estimating a
surface (bivariate function), which will be referred to as a two-dimensional
rate. Note that $\hat\sigma^2$ has a one-dimensional rate in the sparse setting, while both
$\hat R(s,t)$ and $\hat\sigma^2$ have root-n rates in the dense setting. Most of the
discussions in Section 3.1 obviously also apply here and will not be repeated.

2. Yao, Müller and Wang (2005a) smoothed the products of residuals instead
of $Y_{ij}Y_{ik}$ in the local linear smoothing algorithm in (2.2). There is some
evidence that a slightly better rate can be achieved in that procedure.
However, we were not successful in establishing such a rate rigorously. 

3.3. Convergence rates in FPCA. By (C5), the covariance function has
the spectral decomposition

$R(s,t) = \sum_{j=1}^{\infty} \omega_j \psi_j(s) \psi_j(t)$,

where $\omega_1 \ge \omega_2 \ge \cdots \ge 0$ are the eigenvalues of $R(\cdot,\cdot)$ and the $\psi_j$'s are the
corresponding eigenfunctions. The $\psi_j$'s are also known as the functional
principal components. Below, we assume that the nonzero $\omega_j$'s are distinct.

Suppose $\hat R(s,t)$ is the covariance estimator given in Section 2, and it
admits the following spectral decomposition:

$\hat R(s,t) = \sum_{j=1}^{\infty} \hat\omega_j \hat\psi_j(s) \hat\psi_j(t)$,

where $\hat\omega_1 \ge \hat\omega_2 \ge \cdots$ are the estimated eigenvalues and the $\hat\psi_j$'s are the
corresponding estimated principal components. Computing the eigenvalues and
eigenfunctions of an integral operator with a symmetric kernel is a well- 
studied problem in applied mathematics. We will not get into that aspect 
of FPCA in this paper. 

Notice also that $\psi_j(t)$ and $\hat\psi_j(t)$ are identifiable up to a sign change. As
pointed out in Hall, Müller and Wang (2006), this causes no problem in
practice, except when we discuss the convergence rate of $\hat\psi_j$. Following the
same convention as in Hall, Müller and Wang (2006), we let $\psi_j$ take an
arbitrary sign but choose $\hat\psi_j$ such that $\|\hat\psi_j - \psi_j\|$ is minimized over the two
signs, where $\|f\| := \{\int f^2(t)\,dt\}^{1/2}$ denotes the usual $L^2$-norm of a function
$f \in L^2[a,b]$.
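
In a simulation, where the true eigenfunction is known, this sign convention amounts to a one-line comparison. The sketch below is only an illustration: the function name `align_sign`, the representation of functions by their values on an equally spaced grid, and the Riemann-sum approximation of the $L^2$ norm are assumptions made here.

```python
def align_sign(psi_hat, psi, grid_step):
    """Return psi_hat or -psi_hat, whichever is closer to psi in L2 norm.

    psi_hat and psi hold function values on a common equally spaced grid;
    the integral in the norm is approximated by a Riemann sum.
    """
    def l2_dist(f, g):
        return (sum((x - y) ** 2 for x, y in zip(f, g)) * grid_step) ** 0.5

    flipped = [-x for x in psi_hat]
    if l2_dist(flipped, psi) < l2_dist(psi_hat, psi):
        return flipped
    return list(psi_hat)
```

The true eigenfunction keeps whatever sign it was given; only the estimate is flipped, which matches the convention described above.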

Below, let $j_0$ be an arbitrary fixed positive constant.

Theorem 3.6. Under conditions (C1)-(C6), for $1 \le j \le j_0$:

(a) $\hat\omega_j - \omega_j = O((\log n/n)^{1/2} + h_\mu^2 + h_R^2 + \delta_{n1}^2(h_\mu) + \delta_{n2}^2(h_R))$ a.s.;

(b) $\|\hat\psi_j - \psi_j\| = O(h_\mu^2 + \delta_{n1}(h_\mu) + h_R^2 + \delta_{n1}(h_R) + \delta_{n2}^2(h_R))$ a.s.;

(c) $\sup_t |\hat\psi_j(t) - \psi_j(t)| = O(h_\mu^2 + \delta_{n1}(h_\mu) + h_R^2 + \delta_{n1}(h_R) + \delta_{n2}^2(h_R))$ a.s.

Theorem 3.6 is proved by using the asymptotic expansions of eigenvalues 
and eigenfunctions of an estimated covariance function developed by Hall 
and Hosseini-Nasab (2006), and by applying the strong uniform convergence 
rate of $\hat R(s,t)$ in Theorem 3.3. In the special cases of sparse and dense
functional data, we have the following corollary.

Corollary 3.7. Assume that (C1)-(C6) hold. Suppose that
$\max_{1\le i\le n} m_i \le M$ for some fixed M. Then the following hold for all
$1 \le j \le j_0$.

(a) If $(\log n/n)^{1/2} \lesssim h_\mu, h_R \lesssim (\log n/n)^{1/4}$, then $\hat\omega_j - \omega_j = O(\{\log n/n\}^{1/2})$
a.s.

(b) If $h_\mu + (\log n/n)^{1/3} \lesssim h_R \lesssim h_\mu$, then both $\|\hat\psi_j - \psi_j\|$ and
$\sup_t |\hat\psi_j(t) - \psi_j(t)|$ have the rate $O(h_R^2 + \{\log n/(n h_\mu)\}^{1/2})$ a.s.

If $\min_{1\le i\le n} m_i \ge M_n$ for some sequence $M_n$ where $M_n^{-1} \lesssim h_\mu, h_R \lesssim (\log n/n)^{1/4}$,
then, for $1 \le j \le j_0$, all of $\hat\omega_j - \omega_j$, $\|\hat\psi_j - \psi_j\|$ and $\sup_t |\hat\psi_j(t) - \psi_j(t)|$ have
the rate $O(\{\log n/n\}^{1/2})$.
Discussion.

1. Yao, Müller and Wang (2005a, 2005b) developed rate estimates for the
quantities in Theorem 3.6. However, they are not optimal. Hall, Müller
and Wang (2006) considered the rates of $\hat\omega_j - \omega_j$ and $\|\hat\psi_j - \psi_j\|$. The
most striking insight of their results is that for sparse functional data,
even though the estimated covariance operator has the two-dimensional
nonparametric rate, $\hat\psi_j$ converges at a one-dimensional rate while $\hat\omega_j$
converges at a root-n rate if suitable smoothing parameters are used;
remarkably, they also established the asymptotic distribution of $\|\hat\psi_j - \psi_j\|$. At
first sight, it may seem counterintuitive that the convergence rates of $\hat\omega_j$
and $\hat\psi_j$ are faster than that of $\hat R$, since $\hat\omega_j$ and $\hat\psi_j$ are computed from $\hat R$.
However, this can be easily explained. For example, by (4.9) of Hall,
Müller and Wang (2006), $\hat\omega_j - \omega_j = \iint \{\hat R(s,t) - R(s,t)\} \psi_j(s)\psi_j(t)\,ds\,dt$ +
lower-order terms; integrating $\hat R(s,t) - R(s,t)$ in this expression results
in extra smoothing, which leads to a faster convergence rate.

2. Our almost-sure convergence rates are new. However, for both dense
and sparse functional data, the rates on $\hat\omega_j - \omega_j$ and $\|\hat\psi_j - \psi_j\|$ are
slightly slower than the in-probability convergence rates obtained in Hall,
Müller and Wang (2006), which do not contain the $\log n$ factor at various
places of our rate bounds. This is due to the fact that our proofs
are tailored to strong uniform convergence rate derivation. However, the
general strategy in our proofs is amenable to deriving in-probability
convergence rates that are comparable to those in Hall, Müller and Wang
(2006).

3. A potential estimator of the covariance function $R(s,t)$ is

$\tilde R(s,t) = \sum_{j=1}^{J_n} \hat\omega_j \hat\psi_j(s) \hat\psi_j(t)$

for some $J_n$. For the sparse case, in view of the one-dimensional uniform
rate of $\hat\psi_j(t)$ and the root-n rates of $\hat\omega_j$, it might be possible to choose
$J_n \to \infty$ so that $\tilde R(s,t)$ has a faster rate of convergence than does $\hat R(s,t)$.
However, that requires the rates of $\hat\omega_j$ and $\hat\psi_j(t)$ for an unbounded number
of j's, which we do not have at this point.

The proofs of the theorems will be given in Section 5, whereas the proofs
of the corollaries are straightforward and are omitted.

4. Simulation studies. 

4.1. Simulation 1. To illustrate the finite sample performance of the 
method, we perform a simulation study. The data are generated from the 






Y. LI AND T. HSING 



following model:

$Y_{ij} = X_i(T_{ij}) + U_{ij}$   with   $X_i(t) = \mu(t) + \sum_{k=1}^{3} \xi_{ik}\psi_k(t)$,

where $T_{ij} \sim$ Uniform[0, 1], $\xi_{ik} \sim$ Normal$(0, \omega_k)$ and $U_{ij} \sim$ Normal$(0, \sigma^2)$ are
independent variables. Let

$\mu(t) = 5(t - 0.6)^2$,   $\psi_1(t) = 1$,

$\psi_2(t) = \sqrt{2}\sin(2\pi t)$,   $\psi_3(t) = \sqrt{2}\cos(2\pi t)$

and $(\omega_1, \omega_2, \omega_3, \sigma^2) = (0.6, 0.3, 0.1, 0.2)$.
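
For readers who wish to reproduce a data set of this form, the following Python sketch draws curves from the model above using only the standard library. It is an illustration rather than the authors' simulation code; the names `simulate_curves`, `OMEGA`, `SIGMA2` and `PSI`, and the use of a seeded `random.Random` instance, are choices made here.

```python
import math
import random

OMEGA = (0.6, 0.3, 0.1)  # eigenvalues omega_1, omega_2, omega_3
SIGMA2 = 0.2             # variance of the measurement errors U_ij


def mu(t):
    """Mean function mu(t) = 5 (t - 0.6)^2."""
    return 5.0 * (t - 0.6) ** 2


PSI = (
    lambda t: 1.0,
    lambda t: math.sqrt(2.0) * math.sin(2.0 * math.pi * t),
    lambda t: math.sqrt(2.0) * math.cos(2.0 * math.pi * t),
)


def simulate_curves(n, m, rng):
    """Draw n curves, each observed at m uniform random times with noise."""
    curves = []
    for _ in range(n):
        # Principal component scores xi_ik ~ Normal(0, omega_k).
        xi = [rng.gauss(0.0, math.sqrt(w)) for w in OMEGA]
        times = [rng.random() for _ in range(m)]
        values = [
            mu(t)
            + sum(x * psi(t) for x, psi in zip(xi, PSI))
            + rng.gauss(0.0, math.sqrt(SIGMA2))
            for t in times
        ]
        curves.append((times, values))
    return curves
```

For example, `simulate_curves(200, 5, random.Random(0))` gives one sparse data set with n = 200 and m = 5 as a list of `(times, values)` pairs.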

We let n = 200 and $m_i = m$ for all i. In each simulation run, we generated
200 trajectories from the model above, and then we compared the estimation
results for m = 5, 10, 50 and $\infty$. When $m = \infty$, we assumed that we know
the whole trajectory and so no measurement error was included. Note that
the cases of m = 5 and $m = \infty$ may be viewed as representing sparse and
complete functional data, respectively, whereas those of m = 10 and m = 50
represent scenarios between the two extremes. For each m value, we
estimated the mean and covariance functions and used the estimated covariance
function to conduct FPCA. The simulation was then repeated 200 times.

For m = 5, 10, 50, the estimation was carried out as described in Section 2.
For $m = \infty$, the estimation procedure was different since no kernel smoothing
is needed; in this case, we simply discretized each curve on a dense grid, then
the mean and covariance functions were estimated using the gridded data.

Notice that $m = \infty$ is the ideal situation where we have the complete
information of each curve, and the estimation results under this scenario
represent the best we can do and all of the estimators have root-n rates.
Our asymptotic theory shows that if $m \to \infty$ as a function of n at a fast
enough rate, the convergence rates for the estimators
are also root-n. We intend to demonstrate this based on simulated data.

The performance of the estimators depends on the choice of bandwidths
for $\hat\mu(t)$, $\hat C(s,t)$ and $\hat V(t)$, and the best bandwidths vary with m. The
bandwidth selection problem turns out to be very challenging. We have not come
across a data-driven procedure that works satisfactorily and so this is an
important problem for future research. For lack of a better approach, we
tried picking the bandwidths by the integrated mean square error (IMSE);
that is, for each m and for each function above, we calculated the IMSE over
a range of h and selected the one that minimizes the IMSE. The bandwidths
picked that way worked quite well for the inference of the mean, covariance
and the leading principal components, but less well for $\hat\sigma^2$ and the
eigenvalues. After experimenting with a number of bandwidths, we decided to use
bandwidths that are slightly smaller than the ones picked by IMSE. They
are reported in Table 1. Note that undersmoothing in functional principal
component analysis was also advocated by Hall, Müller and Wang (2006).
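
The IMSE-based selection described above can be sketched in a few lines of Python. This is a hypothetical illustration, not the procedure actually coded by the authors: it assumes the true function is known on a grid (which is only the case in a simulation), and the names `imse`, `pick_bandwidth` and `fit_runs` are introduced here. `fit_runs(h)` is assumed to return the estimates from all simulation runs at bandwidth h, each as a list of values on the grid.

```python
def imse(estimates, truth, grid_step):
    """Average over runs of the integrated squared error on a grid."""
    total = 0.0
    for est in estimates:
        total += sum((e - f) ** 2 for e, f in zip(est, truth)) * grid_step
    return total / len(estimates)


def pick_bandwidth(candidates, fit_runs, truth, grid_step):
    """Return the candidate bandwidth with the smallest IMSE."""
    return min(candidates, key=lambda h: imse(fit_runs(h), truth, grid_step))
```

Choosing a value slightly below the minimizer returned by `pick_bandwidth` would mimic the undersmoothing adopted for Table 1.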
Table 1
Bandwidths in simulation 1

          $h_\mu$   $h_R$    $h_V$
m = 5     0.153     0.116    0.138
m = 10    0.138     0.103    0.107
m = 50    0.107     0.077    0.084

The estimation results for $\mu(\cdot)$ are summarized in Figure 1, where we
plot the mean and the pointwise first and 99th percentiles of the estimator.
To compare with standard nonparametric regression, we also provide the
estimation results for $\mu$ when m = 1; note that in this case the covariance
function is not estimable since there is no within-curve information. As can
be seen, the estimation result for m = 1 is not very different from that
of m = 5, reconfirming the nonparametric convergence rate of $\hat\mu$ for sparse
functional data. It is somewhat difficult to describe the estimation results
of the covariance function directly. Instead, we summarize the results on
$\psi_k(\cdot)$ and $\omega_k$ in Figure 2, where we plot the mean and the pointwise first
and 99th percentiles of the estimated eigenfunctions. In Figure 3, we also



[Figure 1 panels: m = 1, m = 5, m = 10; horizontal axis t.]

Fig. 1. Estimated mean function in simulation 1. In each panel, the solid line is the
true mean function, the dashed line is the pointwise mean and the two dotted lines are
the pointwise 1% and 99% percentiles of the estimator of the mean function based on 200
runs.
[Figure 2: horizontal axis t.]

Fig. 2. Estimated eigenfunctions in simulation 1. In each panel, the solid line is the
eigenfunction, the dashed line is the pointwise mean and the two dotted lines are the
pointwise 1% and 99% percentiles of the estimator of the eigenfunction in 200 runs. The
three rows correspond to $\psi_1$, $\psi_2$ and $\psi_3$; different columns correspond to different m values.

show the empirical distributions of $\hat\omega_k$ and $\hat\sigma^2$. In all of the scenarios, the
performance of the estimators improves with m; by m = 50, all of the
estimators perform almost as well as those for $m = \infty$.

4.2. Simulation 2. To illustrate that the proposed methods are applicable
even to cases where the trajectory of X is not smooth, we now present
a second simulation study where X is standard Brownian motion. Again, we
set the time window $[a,b]$ to be $[0,1]$. It is well known that the covariance
function of X is $R(s,t) = \min(s,t)$, $s,t \in [0,1]$, which has an infinite spectral
decomposition with

$\omega_k = 4/\{(2k-1)^2\pi^2\}$,   $\psi_k(t) = \sqrt{2}\sin\{(k - 1/2)\pi t\}$,   $k = 1, 2, \ldots$.

Again, let the observation times be $T_{ij} \sim$ Uniform[0, 1], $Y_{ij} = X_i(T_{ij}) + U_{ij}$,
$U_{ij} \sim$ Normal$(0, \sigma^2)$. We let $\sigma^2 = 0.1^2$, which is comparable to $\omega_3$.
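
As a quick numerical check of this spectral decomposition, the short Python sketch below evaluates the eigenvalues and verifies that a truncated expansion approximates $\min(s,t)$. The function names `bm_eigenvalue`, `bm_eigenfunction` and `truncated_cov` are labels introduced here for illustration.

```python
import math


def bm_eigenvalue(k):
    """omega_k = 4 / {(2k - 1)^2 pi^2} for standard Brownian motion on [0, 1]."""
    return 4.0 / ((2 * k - 1) ** 2 * math.pi ** 2)


def bm_eigenfunction(k, t):
    """psi_k(t) = sqrt(2) sin{(k - 1/2) pi t}."""
    return math.sqrt(2.0) * math.sin((k - 0.5) * math.pi * t)


def truncated_cov(s, t, terms):
    """Partial sum of the spectral decomposition of R(s, t) = min(s, t)."""
    return sum(
        bm_eigenvalue(k) * bm_eigenfunction(k, s) * bm_eigenfunction(k, t)
        for k in range(1, terms + 1)
    )
```

The first three eigenvalues round to 0.405, 0.045 and 0.016, and the partial sums approach $\min(s,t)$ only slowly because the eigenvalues decay like $k^{-2}$.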

Since X is not differentiable with probability one, smoothing individual 
trajectories is not sensible even for large m values. Also, R(s,t) is not differ- 
entiable on the diagonal {s = t}, and therefore the smoothness assumption 
[Figure 3 panels: m = 5, m = 10, m = 50, m = $\infty$.]

Fig. 3. Box plots for $\hat\omega_1$, $\hat\omega_2$, $\hat\omega_3$ and $\hat\sigma^2$ in simulation 1.

in our theory is not satisfied. Nevertheless, as we will show below, the pro- 
posed method still works reasonably well. The reason is that the smoothness 
assumption on $R(s,t)$ in our theory is meant to guarantee the best convergence
rate for $\hat R(s,t)$. When the assumption is mildly violated, the estimator
may still perform well overall but may have a slower convergence rate
at the nonsmooth points. A similar phenomenon was observed in Li et al. 
(2007), which studied kernel estimation of a stationary covariance function 
in a time-series setting. 

We set n = 200 and m = 5, 10 or 50 in our simulations. The estimation 
results for the first three eigenfunctions are presented in Figure 4. Again, we 
plot the mean and the pointwise first and 99th percentiles of the estimated 
eigenfunctions. As can be seen, it is in general much harder to estimate 
the higher-order eigenfunctions, and the results improve as we increase m. 
The empirical distributions of the estimated eigenvalues as well as $\hat\sigma^2$ are
summarized in Figure 5. The estimated eigenvalues should be compared 
with the true ones, which are (0.405,0.045,0.016). When m is large, the 
estimated eigenvalues are very close to the true values. 

5. Proofs. 

5.1. Proof of Theorem 3.1. The proof is an adaptation of familiar lines
of proofs established in the nonparametric function estimation literature; see Claeskens
and Van Keilegom (2003) and Härdle, Janssen and Serfling (1988). For




[Figure 4 panels: m = 5, m = 10, m = 50; horizontal axis t.]

Fig. 4. Estimated eigenfunctions in simulation 2. In each panel, the solid line is the
eigenfunction, the dashed line is the pointwise mean and the two dotted lines are the
pointwise 1% and 99% percentiles of the estimator of the eigenfunction in 200 runs. The
three rows correspond to $\psi_1$, $\psi_2$ and $\psi_3$; different columns correspond to different m values.

simplicity, throughout this subsection, we abbreviate $h_\mu$ as h. Below, let
$t_1 \wedge t_2 = \min(t_1, t_2)$ and $t_1 \vee t_2 = \max(t_1, t_2)$. Also define $K_{(\ell)}(t) = t^{\ell} K(t)$
and $K_{h,(\ell)}(v) = (1/h) K_{(\ell)}(v/h)$.

Lemma 1. Assume that 

(5.1) E( sup \X(t)\ x ) <oo and E\U\ x <oo for some X E (2, oo). 
He[a,b] ' 

Let = Xi(Tij) or Uij for 1 < i < n, 1 < j < mi. Let c n be any positive se- 
quence tending to and (3 n = c 2 n + c n /j n i. Assume that /3~ 1 (log?i/?i) 1 ~ 2 / A = 
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0J\ CO UJ2 




o 



m. = 5 m = 10 m = 50 m = 5 m = 10 m = 50 

FlG. 5. i?o:r p/o£s /or oj\, Q2, £3 and a 2 in simulation 2. 



o(l). Let 




and 



V n (t,c) = sup\G n {t,t + u) -G(t,t + u)\, c>0. 

|u|<c 

Then 

(5.3) sup V n (t,c n ) = 0(n- 1 / 2 {(3 n logn} 1 / 2 ) a.s. 

te[a,b] 

Proof. We can obviously treat the positive and negative parts of $\mathcal{Z}_{ij}$
separately, and will assume below that $\mathcal{Z}_{ij}$ is nonnegative. Define an equally
spaced grid $\mathcal{G} := \{v_k\}$, with $v_k = a + kc_n$, for $k = 0,\ldots,[(b-a)/c_n]$, and
$v_{[(b-a)/c_n]+1} = b$, where $[\cdot]$ denotes the greatest integer part. For any $t \in [a,b]$
and $|u| \le c_n$, let $v_k$ be a grid point that is within $c_n$ of both $t$ and $t+u$,
which exists. Since

$|G_n(t,t+u) - G(t,t+u)| \le |G_n(v_k,t+u) - G(v_k,t+u)| + |G_n(v_k,t) - G(v_k,t)|$,

we have

$|G_n(t,t+u) - G(t,t+u)| \le 2\sup_{t\in\mathcal{G}} V_n(t,c_n)$.

Thus,

(5.4)   $\sup_{t\in[a,b]} V_n(t,c_n) \le 2\sup_{t\in\mathcal{G}} V_n(t,c_n)$.

From now on, we focus on the right-hand side of (5.4). Let

(5.5)   $a_n = n^{-1/2}\{\beta_n \log n\}^{1/2}$ and $Q_n = \beta_n/a_n$,

and define $G_n^*(t_1,t_2)$, $G^*(t_1,t_2)$ and $V_n^*(t,c_n)$ in the same way as $G_n(t_1,t_2)$,
$G(t_1,t_2)$ and $V_n(t,c_n)$, respectively, except with $\mathcal{Z}_{ij}I(\mathcal{Z}_{ij} \le Q_n)$ replacing
$\mathcal{Z}_{ij}$. Then

(5.6)   $\sup_{t\in\mathcal{G}} V_n(t,c_n) \le \sup_{t\in\mathcal{G}} V_n^*(t,c_n) + A_{n1} + A_{n2}$,

where

$A_{n1} = \sup_{t\in\mathcal{G}}\sup_{|u|\le c_n}\{G_n(t,t+u) - G_n^*(t,t+u)\}$,

$A_{n2} = \sup_{t\in\mathcal{G}}\sup_{|u|\le c_n}\{G(t,t+u) - G^*(t,t+u)\}$.

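We first consider $A_{n1}$ and $A_{n2}$. It follows that

(5.7)   $a_n^{-1}Q_n^{1-\lambda} = \{\beta_n^{-1}(\log n/n)^{1-2/\lambda}\}^{\lambda/2} = o(1)$.

For all $t$ and $u$, by Markov's inequality,

$a_n^{-1}\{G_n(t,t+u) - G_n^*(t,t+u)\} \le a_n^{-1}\frac{1}{n}\sum_{i=1}^{n}\Big\{\frac{1}{m_i}\sum_{j=1}^{m_i}\mathcal{Z}_{ij}I(\mathcal{Z}_{ij} > Q_n)\Big\}$
$\le a_n^{-1}Q_n^{1-\lambda}\frac{1}{n}\sum_{i=1}^{n}\Big\{\frac{1}{m_i}\sum_{j=1}^{m_i}\mathcal{Z}_{ij}^{\lambda}I(\mathcal{Z}_{ij} > Q_n)\Big\}$.

Consider the case $\mathcal{Z}_{ij} = X_i(T_{ij})$, the other case being simpler. It follows that

$\frac{1}{m_i}\sum_{j=1}^{m_i}\mathcal{Z}_{ij}^{\lambda}I(\mathcal{Z}_{ij} > Q_n) \le W_i$, where $W_i = \sup_{t\in[a,b]}|X_i(t)|^{\lambda}$.

Thus,

(5.8)   $a_n^{-1}\{G_n(t,t+u) - G_n^*(t,t+u)\} \le a_n^{-1}Q_n^{1-\lambda}\frac{1}{n}\sum_{i=1}^{n}W_i$.

By the SLLN, $n^{-1}\sum_{i=1}^{n}W_i \to E(\sup_{t\in[a,b]}|X(t)|^{\lambda}) < \infty$ a.s. By (5.7) and (5.8),
$a_n^{-1}A_{n1} \to 0$ a.s. By (5.7) and (5.8) again, $a_n^{-1}A_{n2} \to 0$, and so we have proved

(5.9)   $A_{n1} + A_{n2} = o(a_n)$ a.s. as $n \to \infty$.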

To bound $V_n^*(t,c_n)$ for a fixed $t \in \mathcal{G}$, we perform a further partition. Define
$w_n = [Q_nc_n/a_n + 1]$ and $u_r = rc_n/w_n$, for $r = -w_n, -w_n+1, \ldots, w_n$. Note
that $G_n^*(t,t+u)$ is monotone in $|u|$ since $\mathcal{Z}_{ij} \ge 0$. Suppose that $0 \le u_r \le u \le u_{r+1}$. Then

$G_n^*(t,t+u_r) - G^*(t,t+u_r) + G^*(t,t+u_r) - G^*(t,t+u_{r+1})$
$\le G_n^*(t,t+u) - G^*(t,t+u)$
$\le G_n^*(t,t+u_{r+1}) - G^*(t,t+u_{r+1}) + G^*(t,t+u_{r+1}) - G^*(t,t+u_r)$,

from which we conclude that

$|G_n^*(t,t+u) - G^*(t,t+u)| \le \max(\xi_{nr},\xi_{n,r+1}) + G^*(t+u_r,t+u_{r+1})$,

where

$\xi_{nr} = |G_n^*(t,t+u_r) - G^*(t,t+u_r)|$.

The same holds if $u_r \le u \le u_{r+1} \le 0$. Thus,

$V_n^*(t,c_n) \le \max_{-w_n\le r\le w_n}\xi_{nr} + \max_{-w_n\le r\le w_n}G^*(t+u_r,t+u_{r+1})$.

For all $r$,

$G^*(t+u_r,t+u_{r+1}) \le Q_nP(t+u_r \le T \le t+u_{r+1}) \le M_TQ_n(u_{r+1}-u_r) \le M_Ta_n$.

Therefore, for any $B$,

(5.10)   $P\{V_n^*(t,c_n) \ge Ba_n\} \le P\{\max_{-w_n\le r\le w_n}\xi_{nr} \ge (B-M_T)a_n\}$.

Now let $Z_i = m_i^{-1}\sum_{j=1}^{m_i}\mathcal{Z}_{ij}I(\mathcal{Z}_{ij} \le Q_n)I(T_{ij} \in (t,t+u_r])$ so that
$\xi_{nr} = |n^{-1}\sum_{i=1}^{n}\{Z_i - E(Z_i)\}|$. We have $|Z_i - E(Z_i)| \le Q_n$, and

$\sum_{i=1}^{n}\mathrm{var}(Z_i) \le \sum_{i=1}^{n}EZ_i^2 \le M\sum_{i=1}^{n}(c_n^2 + c_n/m_i) \le Mn\beta_n$

for some finite $M$. By Bernstein's inequality,

$P\{\xi_{nr} \ge (B-M_T)a_n\} \le \exp\Big\{-\frac{(B-M_T)^2n^2a_n^2}{2\sum_{i=1}^{n}\mathrm{var}(Z_i) + (2/3)(B-M_T)Q_nna_n}\Big\}$
$\le \exp\Big\{-\frac{(B-M_T)^2n^2a_n^2}{2Mn\beta_n + (2/3)(B-M_T)n\beta_n}\Big\} = n^{-B^*}$,

where $B^* = \frac{(B-M_T)^2}{2M + (2/3)(B-M_T)}$. By (5.10) and Boole's inequality,

$P\{\sup_{t\in\mathcal{G}}V_n^*(t,c_n) \ge Ba_n\} \le \Big(\Big[\frac{b-a}{c_n}\Big] + 1\Big)\Big(2\Big[\frac{Q_nc_n}{a_n} + 1\Big] + 1\Big)n^{-B^*} \le C\frac{Q_n}{a_n}n^{-B^*}$

for some finite $C$. Now $Q_n/a_n = \beta_n/a_n^2 = n/\log n$. So $P\{\sup_{t\in\mathcal{G}}V_n^*(t,c_n) \ge Ba_n\}$
is summable in $n$ if we select $B$ large enough such that $B^* > 2$. By the
Borel-Cantelli lemma,

(5.11)   $\sup_{t\in\mathcal{G}}V_n^*(t,c_n) = O(a_n)$ a.s.

Hence, (5.3) follows from combining (5.4), (5.6), (5.9) and (5.11). □
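
The bookkeeping behind (5.5), (5.7) and the Bernstein step is easy to get wrong by a power of $n$, so a quick numerical check may help. The following is a minimal sketch with hypothetical function names; the values chosen for $n$, $\beta_n$, $\lambda$, $B$, $M$ and $M_T$ are arbitrary and carry no meaning beyond the check.

```python
import math

def lemma1_quantities(n, beta_n):
    """Return (a_n, Q_n) as defined in (5.5)."""
    a_n = math.sqrt(beta_n * math.log(n) / n)
    q_n = beta_n / a_n
    return a_n, q_n

def lhs_57(n, beta_n, lam):
    """Left-hand side of (5.7): a_n^{-1} Q_n^{1 - lambda}."""
    a_n, q_n = lemma1_quantities(n, beta_n)
    return q_n ** (1.0 - lam) / a_n

def rhs_57(n, beta_n, lam):
    """Middle expression of (5.7)."""
    return ((math.log(n) / n) ** (1.0 - 2.0 / lam) / beta_n) ** (lam / 2.0)

def bernstein_exponent(n, beta_n, B, M, M_T):
    """Exponent in the second Bernstein bound, using var sum <= M n beta_n."""
    a_n, q_n = lemma1_quantities(n, beta_n)
    numerator = (B - M_T) ** 2 * n ** 2 * a_n ** 2
    denominator = 2.0 * M * n * beta_n + (2.0 / 3.0) * (B - M_T) * q_n * n * a_n
    return numerator / denominator

def b_star(B, M, M_T):
    """The constant B* such that the bound equals n^{-B*}."""
    return (B - M_T) ** 2 / (2.0 * M + (2.0 / 3.0) * (B - M_T))

n, beta_n = 1000, 0.05                      # arbitrary illustrative values
a_n, q_n = lemma1_quantities(n, beta_n)
print(q_n / a_n, n / math.log(n))           # the two numbers should agree
print(bernstein_exponent(n, beta_n, 10.0, 1.0, 2.0),
      b_star(10.0, 1.0, 2.0) * math.log(n))  # the two numbers should agree
```

Both printed pairs agree because $Q_na_n = \beta_n$ and $n^2a_n^2 = n\beta_n\log n$, which is exactly what turns the exponential bound into $n^{-B^*}$.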



Lemma 2. Let $\mathcal{Z}_{ij}$ be as in Lemma 1 and assume that (5.1) holds. Let
$h = h_n$ be a bandwidth and let $\beta_n = h^2 + h/\gamma_{n1}$. Assume that $h \to 0$ and
$\beta_n^{-1}(\log n/n)^{1-2/\lambda} = o(1)$. For any nonnegative integer $p$, let

$D_{p,n}(t) = \frac{1}{n}\sum_{i=1}^{n}\Big\{\frac{1}{m_i}\sum_{j=1}^{m_i}K_{h,(p)}(T_{ij}-t)\mathcal{Z}_{ij}\Big\}$.

Then we have

$\sup_{t\in[a,b]}\sqrt{nh^2/(\beta_n\log n)}\,|D_{p,n}(t) - E\{D_{p,n}(t)\}| = O(1)$ a.s.



Proof. Since both $K$ and $t^p$ are of bounded variation, $K_{(p)}$ is also of
bounded variation. Thus, we can write $K_{(p)} = K_{(p),1} - K_{(p),2}$, where $K_{(p),1}$
and $K_{(p),2}$ are both increasing functions; without loss of generality, assume
that $K_{(p),1}(-1) = K_{(p),2}(-1) = 0$. Below, we apply Lemma 1 by letting $c_n = 2h$.
It is clear that the assumptions of Lemma 1 hold here. Write

$D_{p,n}(t) = \frac{1}{n}\sum_{i=1}^{n}\Big\{\frac{1}{m_i}\sum_{j=1}^{m_i}\mathcal{Z}_{ij}I(T_{ij} \le t+h)K_{h,(p)}(T_{ij}-t)\Big\}$
$= \int_{-h}^{h}\frac{1}{n}\sum_{i=1}^{n}\Big\{\frac{1}{m_i}\sum_{j=1}^{m_i}\mathcal{Z}_{ij}I(T_{ij} \in [t+u,t+h])\Big\}\,dK_{h,(p)}(u)$
$= \int_{-h}^{h}G_n(t+u,t+h)\,dK_{h,(p)}(u)$,

where $G_n$ is as defined in (5.2). We have

(5.12)   $\sup_{t\in[a,b]}|D_{p,n}(t) - E\{D_{p,n}(t)\}| \le \sup_{t\in[a,b]}V_n(t,2h)\int_{-h}^{h}|dK_{h,(p)}(u)|$
$\le \{K_{(p),1}(1) + K_{(p),2}(1)\}h^{-1}\sup_{t\in[a,b]}V_n(t,2h)$,

and the conclusion of the lemma follows from Lemma 1. □

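Proof of Theorem 3.1. Define

$R_r^* = R_r - \mu(t)S_r - h\mu^{(1)}(t)S_{r+1}$.

By straightforward calculations, we have

(5.13)   $\hat\mu(t) - \mu(t) = \frac{R_0^*S_2 - R_1^*S_1}{S_0S_2 - S_1^2}$,

where $S_0, S_1, S_2$ are defined as in (2.1). Write

$R_r^* = \frac{1}{n}\sum_{i}\frac{1}{m_i}\sum_{j=1}^{m_i}K_h(T_{ij}-t)\{(T_{ij}-t)/h\}^r\{Y_{ij} - \mu(t) - \mu^{(1)}(t)(T_{ij}-t)\}$
$= \frac{1}{n}\sum_{i}\frac{1}{m_i}\sum_{j=1}^{m_i}K_h(T_{ij}-t)\{(T_{ij}-t)/h\}^r\{\varepsilon_{ij} + \mu(T_{ij}) - \mu(t) - \mu^{(1)}(t)(T_{ij}-t)\}$.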
By Taylor's expansion and Lemma 2, uniformly in $t$,

(5.14)   $R_r^* = \frac{1}{n}\sum_{i}\frac{1}{m_i}\sum_{j=1}^{m_i}K_h(T_{ij}-t)\{(T_{ij}-t)/h\}^r\varepsilon_{ij} + O(h^2)$,

and it follows from Lemma 2 that

(5.15)   $R_r^* = O(h^2 + \delta_{n1}(h))$ a.s.

Now, at any interior point $t \in [a+h, b-h]$, since $f$ has a bounded derivative,

$E\{S_0\} = \int_{-1}^{1}K(v)f(t+hv)\,dv = f(t) + O(h)$,
$E\{S_1\} = O(h)$,   $E\{S_2\} = f(t)\nu_2 + O(h)$,

where $\nu_2 = \int v^2K(v)\,dv$. By Lemma 2, we conclude that, uniformly for
$t \in [a+h, b-h]$,

(5.16)   $S_0 = f(t) + O(h + \delta_{n1}(h))$,   $S_1 = O(h + \delta_{n1}(h))$,
         $S_2 = f(t)\nu_2 + O(h + \delta_{n1}(h))$.

Thus, the rate in the theorem is established by applying (5.13). The same
rate can also be similarly seen to hold for boundary points. □
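
The algebraic identity (5.13) can be checked numerically. The sketch below assumes that $S_r$ and $R_r$ are the kernel-weighted moments that appear in the display for $R_r^*$, that is, $S_r = n^{-1}\sum_i m_i^{-1}\sum_j K_h(T_{ij}-t)\{(T_{ij}-t)/h\}^r$ and $R_r$ the same sum weighted by $Y_{ij}$, and that $\hat\mu(t) = (R_0S_2 - R_1S_1)/(S_0S_2 - S_1^2)$. The Epanechnikov kernel, the function names and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def epanechnikov(v):
    """Kernel supported on [-1, 1] (an assumed choice of K)."""
    return 0.75 * np.clip(1.0 - v ** 2, 0.0, None)

def moments(t, T, Y, h):
    """Return (S_0, S_1, S_2) and (R_0, R_1) at the point t.
    T and Y are lists of 1-D arrays, one pair per curve."""
    n = len(T)
    S = np.zeros(3)
    R = np.zeros(2)
    for Ti, Yi in zip(T, Y):
        v = (Ti - t) / h
        w = epanechnikov(v) / h               # K_h(T_ij - t)
        for r in range(3):
            S[r] += np.mean(w * v ** r) / n   # average within curve, then over curves
        for r in range(2):
            R[r] += np.mean(w * v ** r * Yi) / n
    return S, R

def local_linear_mean(t, T, Y, h):
    """Local-linear estimate of the mean at t (assumed closed form)."""
    S, R = moments(t, T, Y, h)
    return (R[0] * S[2] - R[1] * S[1]) / (S[0] * S[2] - S[1] ** 2)

def star_residuals(S, R, mu_t, dmu_t, h):
    """R_r^* = R_r - mu(t) S_r - h mu'(t) S_{r+1} for r = 0, 1."""
    return np.array([R[r] - mu_t * S[r] - h * dmu_t * S[r + 1] for r in range(2)])

# Simulated sparse design: 50 curves with 3 to 6 observations each.
rng = np.random.default_rng(1)
T = [rng.uniform(0.0, 1.0, size=rng.integers(3, 7)) for _ in range(50)]
Y = [np.sin(2 * np.pi * Ti) + 0.1 * rng.standard_normal(Ti.size) for Ti in T]

t0, h = 0.5, 0.3
S, R = moments(t0, T, Y, h)
Rs = star_residuals(S, R, mu_t=0.7, dmu_t=-1.2, h=h)   # any values work in (5.13)
lhs = local_linear_mean(t0, T, Y, h) - 0.7
rhs = (Rs[0] * S[2] - Rs[1] * S[1]) / (S[0] * S[2] - S[1] ** 2)
print(lhs, rhs)
```

The identity holds for any numbers substituted for $\mu(t)$ and $\mu^{(1)}(t)$, because the $S_{r+1}$ terms cancel in the numerator; the proof then uses the true values so that $R_r^*$ is small.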

5.2. Proofs of Theorems 3.3 and 3.4.

Lemma 3. Assume that

(5.17)   $E\big(\sup_{t\in[a,b]}|X(t)|^{2\lambda}\big) < \infty$ and $E|U|^{2\lambda} < \infty$ for some $\lambda \in (2,\infty)$.

Let $\mathcal{Z}_{ijk}$ be $X_i(T_{ij})X_i(T_{ik})$, $X_i(T_{ij})U_{ik}$ or $U_{ij}U_{ik}$. Let $c_n$ be any positive
sequence tending to 0 and $\beta_n = c_n^4 + c_n^3/\gamma_{n1} + c_n^2/\gamma_{n2}$. Assume that
$\beta_n^{-1}(\log n/n)^{1-2/\lambda} = o(1)$. Let

(5.18)   $G_n(s_1,t_1,s_2,t_2) = \frac{1}{n}\sum_{i=1}^{n}\Big\{\frac{1}{N_i}\sum_{k\ne j}\mathcal{Z}_{ijk}I(T_{ij} \in [s_1\wedge s_2, s_1\vee s_2],\ T_{ik} \in [t_1\wedge t_2, t_1\vee t_2])\Big\}$,

$G(s_1,t_1,s_2,t_2) = E\{G_n(s_1,t_1,s_2,t_2)\}$ and

$V_n(s,t,\delta) = \sup_{|u_1|,|u_2|\le\delta}|G_n(s,t,s+u_1,t+u_2) - G(s,t,s+u_1,t+u_2)|$.

Then

$\sup_{s,t\in[a,b]}V_n(s,t,c_n) = O(n^{-1/2}\{\beta_n\log n\}^{1/2})$ a.s.

Proof. The proof is similar to that of Lemma 1, and so we only outline
the main differences. Let $a_n, Q_n$ be as in (5.5). Let $\mathcal{G}$ be a two-dimensional
grid on $[a,b]^2$ with mesh $c_n$, that is, $\mathcal{G} = \{(v_{k_1},v_{k_2})\}$, where $v_k$ is defined as
in the proof of Lemma 1. Then we have

(5.19)   $\sup_{s,t\in[a,b]}V_n(s,t,c_n) \le 4\sup_{(s,t)\in\mathcal{G}}V_n(s,t,c_n)$.

Define $G_n^*(s_1,t_1,s_2,t_2)$, $G^*(s_1,t_1,s_2,t_2)$ and $V_n^*(s,t,\delta)$ in the same way as
$G_n(s_1,t_1,s_2,t_2)$, $G(s_1,t_1,s_2,t_2)$ and $V_n(s,t,\delta)$ except with $\mathcal{Z}_{ijk}I(\mathcal{Z}_{ijk} \le Q_n)$
replacing $\mathcal{Z}_{ijk}$. Then

(5.20)   $\sup_{(s,t)\in\mathcal{G}}V_n(s,t,c_n) \le \sup_{(s,t)\in\mathcal{G}}V_n^*(s,t,c_n) + A_{n1} + A_{n2}$,

where

$A_{n1} = \sup_{(s,t)\in\mathcal{G}}\sup_{|u_1|,|u_2|\le c_n}|G_n(s,t,s+u_1,t+u_2) - G_n^*(s,t,s+u_1,t+u_2)|$,

$A_{n2} = \sup_{(s,t)\in\mathcal{G}}\sup_{|u_1|,|u_2|\le c_n}|G(s,t,s+u_1,t+u_2) - G^*(s,t,s+u_1,t+u_2)|$.

Using a technique similar to that in the proof of Lemma 1, we can show that
$A_{n1}$ and $A_{n2}$ are $o(a_n)$ almost surely. To bound $V_n^*(s,t,c_n)$ for fixed $(s,t)$,
we create a further partition. Put $w_n = [Q_nc_n/a_n + 1]$ and $u_r = rc_n/w_n$,
$r = -w_n,\ldots,w_n$. Then

$V_n^*(s,t,c_n) \le \max_{-w_n\le r_1,r_2\le w_n}\xi_{n,r_1,r_2}$
$+ \max_{-w_n\le r_1,r_2\le w_n}\{G^*(s,t,s+u_{r_1+1},t+u_{r_2+1}) - G^*(s,t,s+u_{r_1},t+u_{r_2})\}$,

where

$\xi_{n,r_1,r_2} = |G_n^*(s,t,s+u_{r_1},t+u_{r_2}) - G^*(s,t,s+u_{r_1},t+u_{r_2})|$.

It is easy to see that $\mathrm{var}(\xi_{n,r_1,r_2}) \le Mn\beta_n$ for some finite $M$, and the rest of
the proof completely mirrors that of Lemma 1 and is omitted. □

Lemma 4. Let $\mathcal{Z}_{ijk}$ be as in Lemma 3 and assume that (5.17) holds.
Let $h = h_n$ be a bandwidth and let $\beta_n = h^4 + h^3/\gamma_{n1} + h^2/\gamma_{n2}$. Assume that
$h \to 0$ and $\beta_n^{-1}(\log n/n)^{1-2/\lambda} = o(1)$. For any nonnegative integers $p, q$, let

$D_{p,q,n}(s,t) = \frac{1}{n}\sum_{i=1}^{n}\Big\{\frac{1}{N_i}\sum_{k\ne j}\mathcal{Z}_{ijk}K_{h,(p)}(T_{ij}-s)K_{h,(q)}(T_{ik}-t)\Big\}$.

Then, for any $p, q$,

$\sup_{s,t\in[a,b]}\sqrt{nh^4/(\beta_n\log n)}\,|D_{p,q,n}(s,t) - E\{D_{p,q,n}(s,t)\}| = O(1)$ a.s.

Proof. Write

$D_{p,q,n}(s,t) = \frac{1}{n}\sum_{i=1}^{n}\Big[\frac{1}{N_i}\sum_{k\ne j}\mathcal{Z}_{ijk}I(T_{ij} \le s+h)I(T_{ik} \le t+h)K_{h,(p)}(T_{ij}-s)K_{h,(q)}(T_{ik}-t)\Big]$
$= \int\!\!\int_{(u,v)\in[-h,h]^2}\frac{1}{n}\sum_{i=1}^{n}\Big[\frac{1}{N_i}\sum_{k\ne j}\mathcal{Z}_{ijk}I(T_{ij} \in [s+u,s+h])I(T_{ik} \in [t+v,t+h])\Big]\,dK_{h,(p)}(u)\,dK_{h,(q)}(v)$
$= \int\!\!\int_{(u,v)\in[-h,h]^2}G_n(s+u,t+v,s+h,t+h)\,dK_{h,(p)}(u)\,dK_{h,(q)}(v)$,

where $G_n$ is as in (5.18). Now,

$\sup_{(s,t)\in[a,b]^2}|D_{p,q,n}(s,t) - E\{D_{p,q,n}(s,t)\}|$
$\le \sup_{s,t\in[a,b]}V_n(s,t,2h)\int\!\!\int_{(u,v)\in[-h,h]^2}|d\{K_{h,(p)}(u)\}|\,|d\{K_{h,(q)}(v)\}|$
$= O[\{\beta_n\log n/(nh^4)\}^{1/2}]$ a.s.

by Lemma 3, using the same argument as in (5.12). □

Proof of Theorem 3.3. Let $S_{pq}$, $R_{pq}$, $\mathcal{A}_i$ and $\mathcal{B}$ be defined as in (2.3).
Also, for $p, q \ge 0$, define

$R_{pq}^* = R_{pq} - C(s,t)S_{pq} - h_RC^{(1,0)}(s,t)S_{p+1,q} - h_RC^{(0,1)}(s,t)S_{p,q+1}$.

By straightforward algebra, we have

(5.21)   $(\hat C - C)(s,t) = (\mathcal{A}_1R_{00}^* - \mathcal{A}_2R_{10}^* - \mathcal{A}_3R_{01}^*)\mathcal{B}^{-1}$.

By standard calculations, we have the following rates uniformly on
$[a+h_R, b-h_R]^2$:

$E(S_{00}) = f(s)f(t) + O(h_R)$,   $E(S_{01}) = O(h_R)$,
$E(S_{10}) = O(h_R)$,   $E(S_{02}) = f(s)f(t)\nu_2 + O(h_R)$,
$E(S_{20}) = f(s)f(t)\nu_2 + O(h_R)$,   $E(S_{11}) = O(h_R)$.
By these and Lemma 4, we have the following almost sure uniform rates:

(5.22)   $\mathcal{A}_1 = f^2(s)f^2(t)\nu_2^2 + O(h_R + \delta_{n2}(h_R))$,
         $\mathcal{A}_2 = O(h_R + \delta_{n2}(h_R))$,
         $\mathcal{A}_3 = O(h_R + \delta_{n2}(h_R))$,
         $\mathcal{B} = f^3(s)f^3(t)\nu_2^2 + O(h_R + \delta_{n2}(h_R))$.

To analyze the behavior of the components of (5.21), it suffices now to
analyze $R_{pq}^*$. Write

$R_{00}^* = \frac{1}{n}\sum_{i=1}^{n}\Big[\frac{1}{N_i}\sum_{k\ne j}\{Y_{ij}Y_{ik} - C(s,t) - C^{(1,0)}(s,t)(T_{ij}-s) - C^{(0,1)}(s,t)(T_{ik}-t)\}K_{h_R}(T_{ij}-s)K_{h_R}(T_{ik}-t)\Big]$.

Let $\varepsilon_{ijk}^* = Y_{ij}Y_{ik} - C(T_{ij},T_{ik})$. By Taylor's expansion,

$Y_{ij}Y_{ik} - C(s,t) - C^{(1,0)}(s,t)(T_{ij}-s) - C^{(0,1)}(s,t)(T_{ik}-t)$
$= Y_{ij}Y_{ik} - C(T_{ij},T_{ik}) + C(T_{ij},T_{ik}) - C(s,t) - C^{(1,0)}(s,t)(T_{ij}-s) - C^{(0,1)}(s,t)(T_{ik}-t)$
$= \varepsilon_{ijk}^* + O(h_R^2)$ a.s.

It follows that

(5.23)   $R_{00}^* = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{N_i}\sum_{k\ne j}\varepsilon_{ijk}^*K_{h_R}(T_{ij}-s)K_{h_R}(T_{ik}-t) + O(h_R^2)$.

Applying Lemma 4, we obtain, uniformly in $s, t$,

(5.24)   $R_{00}^* = O(\delta_{n2}(h_R) + h_R^2)$ a.s.

By (5.22),

(5.25)   $\mathcal{A}_1\mathcal{B}^{-1} = [f(s)f(t)]^{-1} + O(h_R + \delta_{n2}(h_R))$.

Thus, $R_{00}^*\mathcal{A}_1\mathcal{B}^{-1} = O(\delta_{n2}(h_R) + h_R^2)$ a.s. Similar derivations show that
$R_{10}^*\mathcal{A}_2\mathcal{B}^{-1}$ and $R_{01}^*\mathcal{A}_3\mathcal{B}^{-1}$ are both of lower order. Thus, the rate in (3.3)
is obtained for $s, t \in [a+h_R, b-h_R]$. As for $s$ and/or $t$ in $[a,a+h_R) \cup (b-h_R,b]$,
similar calculations show that the same rate also holds. The result follows
by taking into account the rate of $\hat\mu$. □
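
The "straightforward algebra" behind (5.21) is a cancellation of the $S_{p+1,q}$ and $S_{p,q+1}$ terms, which can be confirmed numerically. Since (2.3) is not reproduced in this section, the sketch below assumes the usual bivariate local-linear form: $\mathcal{A}_1 = S_{20}S_{02} - S_{11}^2$, $\mathcal{A}_2 = S_{10}S_{02} - S_{01}S_{11}$, $\mathcal{A}_3 = S_{01}S_{20} - S_{10}S_{11}$, $\mathcal{B} = \mathcal{A}_1S_{00} - \mathcal{A}_2S_{10} - \mathcal{A}_3S_{01}$ and $\hat C = (\mathcal{A}_1R_{00} - \mathcal{A}_2R_{10} - \mathcal{A}_3R_{01})\mathcal{B}^{-1}$. These definitions, the function names and the random inputs are assumptions for illustration only.

```python
import numpy as np

def a_and_b(S):
    """S[p, q] holds S_pq for p, q in {0, 1, 2}; returns (A1, A2, A3, B)
    under the assumed form of (2.3)."""
    A1 = S[2, 0] * S[0, 2] - S[1, 1] ** 2
    A2 = S[1, 0] * S[0, 2] - S[0, 1] * S[1, 1]
    A3 = S[0, 1] * S[2, 0] - S[1, 0] * S[1, 1]
    B = A1 * S[0, 0] - A2 * S[1, 0] - A3 * S[0, 1]
    return A1, A2, A3, B

def c_hat(S, R):
    """Assumed closed form of the local-linear covariance estimator."""
    A1, A2, A3, B = a_and_b(S)
    return (A1 * R[0, 0] - A2 * R[1, 0] - A3 * R[0, 1]) / B

def r_star(S, R, C, C10, C01, h):
    """R*_pq = R_pq - C S_pq - h C10 S_{p+1,q} - h C01 S_{p,q+1}."""
    return {
        (p, q): R[p, q] - C * S[p, q] - h * C10 * S[p + 1, q] - h * C01 * S[p, q + 1]
        for (p, q) in [(0, 0), (1, 0), (0, 1)]
    }

# Arbitrary numbers standing in for the kernel moments at one point (s, t).
rng = np.random.default_rng(2)
S = np.array([[1.00, 0.02, 0.20],
              [0.03, 0.01, 0.00],
              [0.20, 0.00, 0.00]]) + 0.01 * rng.standard_normal((3, 3))
R = rng.standard_normal((2, 2))
C, C10, C01, h = 0.8, -0.3, 0.5, 0.25

A1, A2, A3, B = a_and_b(S)
Rs = r_star(S, R, C, C10, C01, h)
lhs = c_hat(S, R) - C
rhs = (A1 * Rs[(0, 0)] - A2 * Rs[(1, 0)] - A3 * Rs[(0, 1)]) / B
print(lhs, rhs)
```

Under these assumed definitions the two printed numbers coincide for any inputs, since $\mathcal{A}_1S_{10} - \mathcal{A}_2S_{20} - \mathcal{A}_3S_{11} = 0$ and $\mathcal{A}_1S_{01} - \mathcal{A}_2S_{11} - \mathcal{A}_3S_{02} = 0$.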

Proof of Theorem 3.4. Note that

$\hat\sigma^2 - \sigma^2 = \frac{1}{b-a}\int_a^b\{\hat V(t) - V(t)\}\,dt - \frac{1}{b-a}\int_a^b\{\hat C(t,t) - C(t,t)\}\,dt$.

To consider $\hat V(t) - V(t)$ we follow the development in the proof of Theorem 3.1.
Recall (2.4) and let $Q_r^* = Q_r - V(t)S_r - h_VV^{(1)}(t)S_{r+1}$. Then, as
in (5.13), we obtain

$\hat V(t) - V(t) = \frac{Q_0^*S_2 - Q_1^*S_1}{S_0S_2 - S_1^2}$.

Write

$Q_r^* = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{m_i}\sum_{j=1}^{m_i}K_{h_V}(T_{ij}-t)\{(T_{ij}-t)/h_V\}^r\{Y_{ij}^2 - V(t) - V^{(1)}(t)(T_{ij}-t)\}$
$= \frac{1}{n}\sum_{i=1}^{n}\frac{1}{m_i}\sum_{j=1}^{m_i}K_{h_V}(T_{ij}-t)\{(T_{ij}-t)/h_V\}^r\{Y_{ij}^2 - V(T_{ij})\} + O(h_V^2)$,

which, by Lemma 1, has the uniform rate $O(h_V^2 + \delta_{n1}(h_V))$ a.s. By (5.16),
we have

$\hat V(t) - V(t) = \frac{1}{f(t)n}\sum_{i=1}^{n}\frac{1}{m_i}\sum_{j=1}^{m_i}K_{h_V}(T_{ij}-t)\{Y_{ij}^2 - V(T_{ij})\} + O(h_V^2 + \delta_{n1}^2(h_V))$ a.s.

Thus,

$\int_a^b\{\hat V(t) - V(t)\}\,dt = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{m_i}\sum_{j=1}^{m_i}\{Y_{ij}^2 - V(T_{ij})\}\int_a^bK_{h_V}(T_{ij}-t)f^{-1}(t)\,dt + O(h_V^2 + \delta_{n1}^2(h_V))$ a.s.

Note that

$\Big|\int_a^bK_{h_V}(T_{ij}-t)f^{-1}(t)\,dt\Big| \le \sup_{t\in[a,b]}f^{-1}(t)$.

By Lemma 5 below in this subsection,

(5.26)   $\int_a^b\{\hat V(t) - V(t)\}\,dt = O((\log n/n)^{1/2} + h_V^2 + \delta_{n1}^2(h_V))$ a.s.



Next, we consider $\hat C(t,t) - C(t,t)$. We apply (5.21) but will focus on
$R_{00}^*\mathcal{A}_1\mathcal{B}^{-1}$ since the other two terms are dealt with similarly. By (5.23)-(5.25),

(5.27)   $R_{00}^*\mathcal{A}_1\mathcal{B}^{-1} = \frac{1}{f(s)f(t)n}\sum_{i=1}^{n}\frac{1}{N_i}\sum_{k\ne j}\varepsilon_{ijk}^*K_{h_R}(T_{ij}-s)K_{h_R}(T_{ik}-t) + O(h_R^2 + \delta_{n2}^2(h_R))$ a.s.

Thus,

$\int_a^b\{\hat C(t,t) - C(t,t)\}\,dt = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{N_i}\sum_{k\ne j}\varepsilon_{ijk}^*\int_a^bK_{h_R}(T_{ij}-t)K_{h_R}(T_{ik}-t)f^{-2}(t)\,dt + O(h_R^2 + \delta_{n2}^2(h_R))$ a.s.

Write

$\int_a^bK_{h_R}(T_{ij}-t)K_{h_R}(T_{ik}-t)f^{-2}(t)\,dt = \int K(u)K_{h_R}((T_{ik}-T_{ij}) + uh_R)f^{-2}(T_{ij}-uh_R)\,du$.

A slightly modified version of Lemma 1 leads to the "one-dimensional" rate:

$\sup_{u\in[0,1]}\Big|\frac{1}{n}\sum_{i=1}^{n}\frac{1}{N_i}\sum_{k\ne j}\varepsilon_{ijk}^*K_{h_R}((T_{ik}-T_{ij}) + uh_R)f^{-2}(T_{ij}-uh_R)\Big| = O(\delta_{n1}(h_R))$ a.s.

It follows that

(5.28)   $\int_a^b\{\hat C(t,t) - C(t,t)\}\,dt = O(h_R^2 + \delta_{n1}(h_R) + \delta_{n2}^2(h_R))$ a.s.

The theorem follows from (5.26) and (5.28). □

Lemma 5. Assume that $\xi_{ni}$, $1 \le i \le n$, are independent random variables
with mean zero and finite variance. Also assume that there exist i.i.d. random
variables $\xi_i$ with mean zero and finite $\delta$th moment for some $\delta > 2$ such that
$|\xi_{ni}| \le |\xi_i|$. Then

$\frac{1}{n}\sum_{i=1}^{n}\xi_{ni} = O((\log n/n)^{1/2})$ a.s.

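Proof. Let $a_n = (\log n/n)^{1/2}$. Assume that $\xi_{ni} \ge 0$. Write

$\xi_{ni} = \xi_{ni\succ} + \xi_{ni\prec} := \xi_{ni}I(|\xi_{ni}| > a_n^{-1}) + \xi_{ni}I(|\xi_{ni}| \le a_n^{-1})$.

Then

$\frac{1}{a_nn}\sum_{i=1}^{n}\xi_{ni\succ} \le \frac{1}{a_nn}\sum_{i=1}^{n}|\xi_i|I(|\xi_i| > a_n^{-1}) \le a_n^{\delta-2}\frac{1}{n}\sum_{i=1}^{n}|\xi_i|^{\delta} \to 0$ a.s.

by the law of large numbers. The mean of the left-hand side is also tending to
zero by the same argument. Thus, $n^{-1}\sum_{i=1}^{n}(\xi_{ni\succ} - E\{\xi_{ni\succ}\}) = o(a_n)$. Next,
by Bernstein's inequality,

$P\Big\{\frac{1}{n}\sum_{i=1}^{n}(\xi_{ni\prec} - E\{\xi_{ni\prec}\}) \ge Ba_n\Big\} \le \exp\Big\{-\frac{B^2n^2a_n^2}{2n\sigma^2 + (2/3)Bn}\Big\} = \exp\Big\{-\frac{B^2\log n}{2\sigma^2 + (2/3)B}\Big\}$,
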

which is summable for large enough $B$. The result follows from the
Borel-Cantelli lemma. □

5.3. Proof of Theorem 3.6. Let $\Delta$ be the integral operator with kernel
$\hat R - R$.

Lemma 6. For any bounded measurable function $\psi$ on $[a,b]$,

$\sup_{t\in[a,b]}|(\Delta\psi)(t)| = O(h_\mu^2 + \delta_{n1}(h_\mu) + h_R^2 + \delta_{n1}(h_R) + \delta_{n2}^2(h_R))$ a.s.

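Proof. It follows that

$(\Delta\psi)(t) = \int_{s=a}^{b}(\hat C - C)(s,t)\psi(s)\,ds - \int_{s=a}^{b}\{\hat\mu(s)\hat\mu(t) - \mu(s)\mu(t)\}\psi(s)\,ds =: A_{n1} - A_{n2}$.

By (5.21),

$A_{n1} = \int_{s=a}^{b}(\mathcal{A}_1R_{00}^* - \mathcal{A}_2R_{10}^* - \mathcal{A}_3R_{01}^*)\mathcal{B}^{-1}\psi(s)\,ds$.

We focus on $\int_{s=a}^{b}\mathcal{A}_1R_{00}^*\mathcal{B}^{-1}\psi(s)\,ds$ since the other two terms are of lower
order and can be dealt with similarly. By (5.23) and (5.25),

$\int_{s=a}^{b}\mathcal{A}_1R_{00}^*\mathcal{B}^{-1}\psi(s)\,ds = \frac{1}{f(t)n}\sum_{i=1}^{n}\frac{1}{N_i}\sum_{k\ne j}\varepsilon_{ijk}^*K_{h_R}(T_{ik}-t)\int_{s=a}^{b}K_{h_R}(T_{ij}-s)\psi(s)f(s)^{-1}\,ds + O(h_R^2 + \delta_{n2}^2(h_R))$.

Note that

$\Big|\int_{s=a}^{b}K_{h_R}(T_{ij}-s)\psi(s)f(s)^{-1}\,ds\Big| \le \sup_{s\in[a,b]}(|\psi(s)|f(s)^{-1})\int_{u=-1}^{1}K(u)\,du$.

Thus, Lemma 1 can be easily adapted to give the following uniform rate
over $t$:

$\frac{1}{n}\sum_{i=1}^{n}\frac{1}{N_i}\sum_{k\ne j}\varepsilon_{ijk}^*K_{h_R}(T_{ik}-t)\int_{s=a}^{b}K_{h_R}(T_{ij}-s)\psi(s)f(s)^{-1}\,ds = O(\delta_{n1}(h_R))$ a.s.

Thus,

$\int_{s=a}^{b}\mathcal{A}_1R_{00}^*\mathcal{B}^{-1}\psi(s)\,ds = O(\delta_{n1}(h_R) + h_R^2 + \delta_{n2}^2(h_R))$ a.s.,

which is also the rate of $A_{n1}$. Next, we write

$A_{n2} = \hat\mu(t)\int_{s=a}^{b}\{\hat\mu(s) - \mu(s)\}\psi(s)\,ds + \{\hat\mu(t) - \mu(t)\}\int_{s=a}^{b}\mu(s)\psi(s)\,ds$,

which has the rate $O(h_\mu^2 + \delta_{n1}(h_\mu))$ by Theorem 3.1. □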

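Proof of Theorem 3.6. We prove (b) first. Hall and Hosseini-Nasab
(2006) give the $L^2$ expansion

$\hat\psi_j - \psi_j = \sum_{k\ne j}(\omega_j - \omega_k)^{-1}\langle\Delta\psi_j,\psi_k\rangle\psi_k + O(\|\Delta\|^2)$,

where $\|\Delta\| = (\int\!\!\int\{\hat R(s,t) - R(s,t)\}^2\,ds\,dt)^{1/2}$, the Hilbert-Schmidt norm of $\Delta$.
By Bessel's inequality, this leads to

$\|\hat\psi_j - \psi_j\| \le C(\|\Delta\psi_j\| + \|\Delta\|^2)$.

By Lemma 6 and Theorem 3.3,

$\|\Delta\psi_j\| = O(h_\mu^2 + \delta_{n1}(h_\mu) + h_R^2 + \delta_{n1}(h_R) + \delta_{n2}^2(h_R))$,

$\|\Delta\|^2 = O(h_\mu^4 + \delta_{n1}^2(h_\mu) + h_R^4 + \delta_{n2}^2(h_R))$ a.s.

Thus,

$\|\hat\psi_j - \psi_j\| = O(h_\mu^2 + \delta_{n1}(h_\mu) + h_R^2 + \delta_{n1}(h_R) + \delta_{n2}^2(h_R))$ a.s.,

proving (b).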

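Next, we consider (a). By (4.9) in Hall, Müller and Wang (2006),

$\hat\omega_j - \omega_j = \int\!\!\int(\hat R - R)(s,t)\psi_j(s)\psi_j(t)\,ds\,dt + O(\|\Delta\psi_j\|^2)$
$= \int\!\!\int(\hat C - C)(s,t)\psi_j(s)\psi_j(t)\,ds\,dt - \int\!\!\int\{\hat\mu(s)\hat\mu(t) - \mu(s)\mu(t)\}\psi_j(s)\psi_j(t)\,ds\,dt + O(\|\Delta\psi_j\|^2)$
$=: A_{n1} - A_{n2} + O(\|\Delta\psi_j\|^2)$.

Now,

$A_{n1} = \int\!\!\int(\mathcal{A}_1R_{00}^* - \mathcal{A}_2R_{10}^* - \mathcal{A}_3R_{01}^*)\mathcal{B}^{-1}\psi_j(s)\psi_j(t)\,ds\,dt$.

Again it suffices to focus on $\int\!\!\int\mathcal{A}_1R_{00}^*\mathcal{B}^{-1}\psi_j(s)\psi_j(t)\,ds\,dt$. By (5.23) and (5.25),

$\int\!\!\int\mathcal{A}_1R_{00}^*\mathcal{B}^{-1}\psi_j(s)\psi_j(t)\,ds\,dt$
$= \frac{1}{n}\sum_{i=1}^{n}\frac{1}{N_i}\sum_{k\ne j}\varepsilon_{ijk}^*\int\!\!\int K_{h_R}(T_{ij}-s)K_{h_R}(T_{ik}-t)\psi_j(s)\psi_j(t)\{f(s)f(t)\}^{-1}\,ds\,dt + O(h_R^2 + \delta_{n2}^2(h_R))$ a.s.,

where the first term on the right-hand side can be shown to be $O((\log n/n)^{1/2})$
a.s. by Lemma 5. Thus,

$A_{n1} = O((\log n/n)^{1/2} + h_R^2 + \delta_{n2}^2(h_R))$.

Next, write

$A_{n2} = \int\{\hat\mu(s) - \mu(s)\}\psi_j(s)\,ds\int\hat\mu(t)\psi_j(t)\,dt + \int\mu(s)\psi_j(s)\,ds\int\{\hat\mu(t) - \mu(t)\}\psi_j(t)\,dt$,

and it can be similarly shown that

$A_{n2} = O((\log n/n)^{1/2} + h_\mu^2 + \delta_{n1}^2(h_\mu))$ a.s.

This establishes (a).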

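Finally, we consider (c). For any $t \in [a,b]$,

$\hat\omega_j\hat\psi_j(t) - \omega_j\psi_j(t) = \int\hat R(s,t)\hat\psi_j(s)\,ds - \int R(s,t)\psi_j(s)\,ds$
$= \int\{\hat R(s,t) - R(s,t)\}\psi_j(s)\,ds + \int\hat R(s,t)\{\hat\psi_j(s) - \psi_j(s)\}\,ds$.

By the Cauchy-Schwarz inequality, uniformly for all $t \in [a,b]$,

$\Big|\int\hat R(s,t)\{\hat\psi_j(s) - \psi_j(s)\}\,ds\Big| \le \Big(\int\hat R^2(s,t)\,ds\Big)^{1/2}\|\hat\psi_j - \psi_j\|$
$\le |b-a|^{1/2}\sup_{s,t}|\hat R(s,t)| \times \|\hat\psi_j - \psi_j\|$
$= O(\|\hat\psi_j - \psi_j\|)$ a.s.

Thus,

$\hat\omega_j\hat\psi_j(t) - \omega_j\psi_j(t) = O(h_\mu^2 + \delta_{n1}(h_\mu) + h_R^2 + \delta_{n1}(h_R) + \delta_{n2}^2(h_R))$ a.s.

By the triangle inequality and (b),

$\omega_j|\hat\psi_j(t) - \psi_j(t)|$
$= |\hat\omega_j\hat\psi_j(t) - \omega_j\psi_j(t) - (\hat\omega_j - \omega_j)\hat\psi_j(t)|$
$\le |\hat\omega_j\hat\psi_j(t) - \omega_j\psi_j(t)| + |\hat\omega_j - \omega_j|\sup_t|\hat\psi_j(t)|$
$= O((\log n/n)^{1/2} + h_\mu^2 + \delta_{n1}(h_\mu) + h_R^2 + \delta_{n1}(h_R) + \delta_{n2}^2(h_R))$ a.s.

Note that $(\log n/n)^{1/2} = o(\delta_{n1}(h_\mu))$. This completes the proof of (c). □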
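
To make the objects in this proof concrete, the sketch below discretizes a covariance kernel on a midpoint grid, extracts eigenvalues and eigenfunctions of the resulting integral operator, and compares the eigenvalue change under a small perturbation $\Delta$ with the first-order term $\int\!\!\int\Delta(s,t)\psi_j(s)\psi_j(t)\,ds\,dt$ used in part (a). Everything here is an illustrative assumption: the cosine eigenfunctions, the eigenvalues (0.6, 0.3, 0.1), the perturbation kernel and the function names are hypothetical and are not taken from the simulations in the paper.

```python
import numpy as np

def fpca_from_covariance(R, grid, n_components):
    """Eigen-decompose the integral operator with kernel R on a uniform grid.
    Returns eigenvalues and eigenfunctions normalized so that the Riemann sum
    of psi_j^2 equals 1."""
    w = grid[1] - grid[0]                       # quadrature weight
    evals, evecs = np.linalg.eigh(R * w)        # symmetric eigenproblem
    order = np.argsort(evals)[::-1][:n_components]
    omega = evals[order]
    psi = evecs[:, order] / np.sqrt(w)
    return omega, psi

def first_order_shift(Delta, psi_j, grid):
    """Riemann-sum version of the double integral of Delta * psi_j * psi_j."""
    w = grid[1] - grid[0]
    return float(psi_j @ Delta @ psi_j) * w ** 2

N = 200
grid = (np.arange(N) + 0.5) / N                 # midpoints of [0, 1]
basis = np.stack([np.ones(N),
                  np.sqrt(2.0) * np.cos(np.pi * grid),
                  np.sqrt(2.0) * np.cos(2.0 * np.pi * grid)], axis=1)
omega_true = np.array([0.6, 0.3, 0.1])          # hypothetical eigenvalues
R = (basis * omega_true) @ basis.T              # sum_j omega_j psi_j(s) psi_j(t)

omega, psi = fpca_from_covariance(R, grid, 3)

Delta = 1e-3 * np.outer(grid, grid)             # small symmetric perturbation
omega_hat, _ = fpca_from_covariance(R + Delta, grid, 3)
approx = omega + np.array([first_order_shift(Delta, psi[:, j], grid)
                           for j in range(3)])
print(omega_hat - approx)                       # remainder after the linear term
```

The remainder printed at the end is of second order in the size of the perturbation, which mirrors the role of the $O(\|\Delta\psi_j\|^2)$ term in the expansion for $\hat\omega_j - \omega_j$.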